DOCUMENT RESUME

ED 161 913                                              TM 007 978

AUTHOR         Petrosko, Joseph M.
TITLE          Evolution of Educational Measurement in the 1970's:
               Changes in Elementary Level Standardized Tests.
PUB DATE       Mar 78
NOTE           22p.; Paper presented at the Annual Meeting of the
               National Council on Measurement in Education
               (Toronto, Ontario, Canada, March, 1978); For related
               documents, see ED 044 446 and 143 670

EDRS PRICE     MF-$0.83 HC-$1.67 Plus Postage.
DESCRIPTORS    Affective Tests; Cognitive Tests; Educational
               Practice; *Educational Trends; Elementary Education;
               *Evaluation Criteria; Evaluation Methods;
               *Standardized Tests; *Student Testing; Test
               Reliability; *Test Reviews; Test Validity

ABSTRACT
     Test evaluation summaries completed by the Center for
the Study of Evaluation in 1970 and 1976 were used to determine
changes in test quantity and quality among elementary-level
standardized instruments. In the earlier studies, the instruments
were rated in four general areas: measurement validity, examinee
appropriateness, administrative usability, and normed technical
excellence. Ratings covered critical indicators of test quality
including reported validity and reliability and quality of score
distribution. To determine changes in quantity, a cross-tabulation
was constructed for 1970 and 1976 data showing the number of tests
available for 41 educational goal areas at each grade level. The
qualitative analysis focused on tests of attitudes, values and
motivation; reasoning; arithmetic operations; and reading readiness.
Quality ratings were compared for concurrent and predictive validity
and test reliability. The numbers of tests evaluated in each
educational goal area in 1970 and 1976 are included, as well as the
numbers and percentages of tests rated for validity and reliability.
Results indicated that the quantity of elementary level standardized
tests increased greatly and that increases were proportionally
consistent within subject areas. However, despite an enormous growth
in the number of tests, a parallel growth in quality did not occur.
(Author/JAC)




***********************************************************************
*   Reproductions supplied by EDRS are the best that can be made      *
*                    from the original document.                      *
***********************************************************************



U.S. DEPARTMENT OF HEALTH,
EDUCATION & WELFARE
NATIONAL INSTITUTE OF
EDUCATION

THIS DOCUMENT HAS BEEN REPRODUCED
EXACTLY AS RECEIVED FROM
THE PERSON OR ORGANIZATION ORIGINATING
IT. POINTS OF VIEW OR OPINIONS
STATED DO NOT NECESSARILY REPRESENT
OFFICIAL NATIONAL INSTITUTE OF
EDUCATION POSITION OR POLICY.



Evolution of Educational Measurement
in the 1970's: Changes in
Elementary Level Standardized Tests

"PERMISSION TO REPRODUCE THIS 
MATERIAL HAS BEEN GRANTED BY

TO THE EDUCATIONAL RESOURCES
INFORMATION CENTER (ERIC) AND
USERS OF THE ERIC SYSTEM."



Joseph M. Petrosko

University of Louisville

Paper presented at the Annual Meeting of the
National Council on Measurement in Education

Toronto, March 1978

Printed in U.S.A.



There have been many changes in educational measurement in the last
several years. Changes have been evident in such things as (a) the
measurement philosophy embodied in instruments, (b) the educational topics
covered by instruments, and (c) the efforts made to deal with problems of
test bias. The testing field has been broadened by the expansion of interest
in criterion-referenced measurement. Theoretical concepts that were a gleam
in a researcher's eye a decade ago have become reality in the form of
commercially available tests. New subjects in the curriculum have led to
newly developed tests. For example, emerging areas of education (e.g.,
occupational and career education, moral education) have matured to such a
degree that educators are now interested in assessing educational outcomes
in these areas.

Few would deny that it is a valuable enterprise to trace the evolution
of educational measurement. But exactly how this should be done is another
matter. The subject can be approached from many different angles. The
present study approached the problem by dealing with two major issues. First,
attention was focused on determining the areas of the curriculum that have
seen an increase in testing options. In other words, what changes have
occurred in the quantity of available tests and what curricular areas have
been affected? Secondly, the study dealt with the issue of test quality.
Irrespective of changes in the number of tests, has the measurement
sophistication of instruments changed?

It is impossible in a single study to cover the entire measurement
field. The study was limited to standardized tests aimed at elementary
school students (i.e., grades 1-6). To examine changes in tests, a unique
data base was used: test evaluation summaries completed by the Center for
the Study of Evaluation. These test evaluations resulted from a large-scale
project involving a quality assessment of all standardized tests available
in the United States. In the course of the project, tests were categorized
by grade level and educational goal area, and then tests were evaluated for
psychometric quality on over 20 criteria of excellence.

Using data assembled at the beginning of the decade (Hoepfner,
Strickland, Stangel, Jansen, & Patalino, 1970) and data from the middle of
the decade (Hoepfner, Bastone, Ogilvie, Hunter, Sparta, Grothe, Shani,
Hufano, Goldstein, Williams, & Smith, 1976), a comparison was made of changes
in test quantity and test quality among elementary-level standardized
instruments.



Method 



Procedure

The procedure for acquiring, categorizing and rating tests was the
same in 1970 and 1976. First, all commercially available standardized tests
at the elementary education level were ordered from publishers. Then the
tests (including the subtests they contained) were categorized by grade
level and by educational goal area (e.g., curriculum topics such as
mathematics or reading). In 1970, the grade levels were 1, 3, 5 and 6; in
1976, the levels were grades 1, 2, 3-4 and 5-6. For both sets of ratings, 41
educational goal areas were used. These covered the entire range of
educational topics in the cognitive, affective and psychomotor domains.
Based on the content of its items (e.g., arithmetic, reading, social studies),
an instrument was categorized into a particular goal area.

After being categorized, the test was evaluated for quality. Each
instrument was rated on more than 20 criteria of excellence. The criteria
were grouped into four general areas: measurement validity, examinee
appropriateness, administrative usability and normed technical excellence.
Ratings covered critical indicators of test quality, for example, reported
validity and reliability and quality of score distribution.

Ratings were independently performed by two trained test reviewers. In
cases of disagreement between the two, a third rater adjudicated
disagreements. All raters had the same information about each test: a standard
set consisting of the test itself and, in most cases, a technical
manual or other type of supporting information. In assessing validity and
reliability, only publisher-supplied information was used. No attempt was
made to search through the research literature and find studies that
employed a particular instrument.

Of necessity, the preceding description of test evaluation procedures
has been brief. Complete details are available in Hoepfner et al. (1970) and
Hoepfner et al. (1976).
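
The two-reviewer procedure described above can be sketched in a few lines of Python. This is only an illustration under stated assumptions: the paper does not say how the third rater reached a decision, so the sketch simply takes the third rater's rating as final, and the function name `final_rating` is hypothetical.

```python
from typing import Optional


def final_rating(rater_a: int, rater_b: int,
                 adjudicator: Optional[int] = None) -> int:
    """Return the rating recorded for one test on one criterion.

    Assumption: when the two independent reviewers agree, their shared
    rating stands; when they disagree, the third rater's rating is used.
    """
    if rater_a == rater_b:
        return rater_a
    if adjudicator is None:
        raise ValueError("reviewers disagree; a third rating is required")
    return adjudicator


print(final_rating(2, 2))     # the two reviewers agree
print(final_rating(1, 2, 2))  # disagreement settled by the third rater
```

Raising an error when no third rating is supplied keeps an unresolved disagreement from being recorded silently.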




Analysis

A straightforward analysis procedure was used to examine changes in
tests. To determine changes in the number of instruments, a crosstabulation
was constructed showing the number of tests available for 41 educational goal
areas at each separate grade level. This was done both for 1970 and 1976
data. In the analysis of test quality, the study concentrated on tests in
several important educational areas: (a) tests of attitudes, values and
motivation, (b) tests of reasoning, (c) tests of arithmetic operations
(i.e., computational ability), and (d) tests of reading readiness. In each
of these four areas, quality ratings were compared for two sets of evaluation
criteria: (a) concurrent and predictive validity, and (b) test reliability.
For each criterion, the number and percentage of tests at each level of
quality were recorded.
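
A crosstabulation of this kind is a count of tests in each cell defined by goal area and grade level. The Python sketch below is only an illustration: the records are invented for the example, and `crosstabulate` and `column_totals` are hypothetical helper names, not part of the original project.

```python
from collections import Counter

# Hypothetical records: one (goal area, grade level) pair per evaluated test.
records = [
    ("Reasoning", "1"),
    ("Reasoning", "5"),
    ("Reasoning", "5"),
    ("Arithmetic Operations", "5"),
    ("Word Recognition", "1"),
]


def crosstabulate(pairs):
    """Count the tests falling in each (goal area, grade level) cell."""
    return Counter(pairs)


def column_totals(table):
    """Sum the cell counts within each grade level."""
    totals = Counter()
    for (_, grade), n in table.items():
        totals[grade] += n
    return totals


table = crosstabulate(records)
totals = column_totals(table)
print(table[("Reasoning", "5")], dict(totals))
```

Building one such table per year gives the two grids that the quantity comparison rests on.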

Some thought was given to using inferential statistics to test
hypotheses regarding differences in test quality between 1970 and 1976. It
was finally decided to forego such analyses since the data used in the study
can be reasonably assumed to represent populations of tests rather than
samples from populations. Inferential tests were, therefore, not reported.

Results



Changes in the Quantity of Tests 



In the first part of the analysis, the numbers of tests in the various
educational goal categories were compared for 1970 and 1976. It was
discovered that there was a substantial increase in the number of instruments.
In 1970, a total of 1,686 test evaluations were completed; in 1976, the
number had risen to 9,127. For both occasions when evaluations were
performed, more tests were found at the higher rather than the lower grade
levels. The largest numbers of tests were located in areas of the
curriculum that might be termed the "3 R's." Educational goal areas involving
reading, writing, and arithmetic had a large number of tests. In addition
to these, important areas of the cognitive and affective educational
domains showed extensive coverage: personal temperament (e.g., tests of
emotional stability), attitudes, values and motivation (e.g., tests of
self-esteem and attitude toward school), and reasoning (e.g., tests of
intelligence).
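
The size of the increase can be checked from the figures reported in the paper. The short Python sketch below is an added illustration, not part of the original analysis; the per-grade figures are the column totals given in Tables 1 and 2, and the variable names are arbitrary.

```python
# Column totals as reported in Table 1 (1970) and Table 2 (1976).
totals_1970 = {"1": 317, "3": 383, "5": 476, "6": 510}
totals_1976 = {"1": 1938, "2": 1891, "3-4": 2746, "5-6": 2552}

n_1970 = sum(totals_1970.values())
n_1976 = sum(totals_1976.values())

# Ratio of test evaluations completed in 1976 to those completed in 1970.
growth_factor = n_1976 / n_1970

print(n_1970, n_1976, round(growth_factor, 2))  # 1686 9127 5.41
```

The column totals reproduce the two figures quoted in the text, and the 1976 count is a little more than five times the 1970 count.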



Table 1 shows the number of tests evaluated in 1970. Table 2 gives
similar information for 1976. It should be noted that the grade level
categories differed somewhat for the two sets of data. Also, the educational
goal categories were different, but only slightly. Additions and deletions
were made so that the 1976 goal list reflected an up-to-date picture of
educational offerings at the elementary education level. For example, goal
category 7 in 1976, Career Values and Understanding, illustrated a new
emphasis on career education that now extends down to the elementary level.

The data revealed a substantial increase in absolute numbers among
almost all goal categories. Some educational areas were not well represented
by instruments on both rating occasions. There were, relatively speaking, a
small number of tests in arts and crafts, foreign language education, music,
science, and social studies.






Table 1

Number of Elementary Standardized
Tests Evaluated in 1970

(? = entry not legible)

                                                     Grade
     Educational Goal Area                     1      3      5      6

  1. Temperament-Personal                     13     14     17     24
  2. Temperament-Social                       15     17     20     21
  3. Attitudes                                 4      5      5      7
  4. Needs and Interests                       0      0      5     23
  5. Valuing Arts and Crafts                   ?      4      6      6
  6. Producing Arts and Crafts                 ?      2      2      2
  7. Understanding Arts and Crafts             ?      3      3      4
  8. Reasoning                                 ?     43     50     47
  9. Creativity                               10      9     10      9
 10. Memory                                   11      9      9     10
 11. Foreign Language Skills                   0      0      2      2
 12. Foreign Language Assimilation             0      0      0      6
 13. Language Construction                    12     34     42     42
 14. Reference Skills                          ?      4     13     14
 15. Arithmetic Concepts                       ?     26     19     19
 16. Arithmetic Operations                     ?      ?     34      ?
 17. Mathematical Applications                 ?      8     12     15
 18. Geometry                                  ?      0      0      ?
 19. Measurement                               ?      1      2      ?
 20. Music Appreciation and Interest           ?      0      0      ?
 21. Music Performance                         ?      0      1      ?
 22. Music Understanding                       ?      0     21      ?
 23. Health and Safety                         ?      ?      ?      ?
 24. Physical Skills                          17     11      4      4
 25. Sportsmanship                             ?      ?      ?      ?
 26. ?                                         ?      ?      ?      ?
 27. Oral-Aural Skills                        10      ?      2      2
 28. Word Recognition                         46     45     32     23
 29. Reading Mechanics                        14     17     16      ?
 30. Reading Comprehension                    84     97     88     91
 31. Reading Interpretation                    0      2     11     13
 32. Reading Appreciation and Response         0      0      1      1
 33. Religious Knowledge                       0      0      ?      0
 34. Religious Belief                          ?      ?      ?      ?
 35. ?                                         ?      0      2      ?
 36. Scientific Knowledge                      0      1      8      ?
 37. ?                                         ?      ?      ?      ?
 38. History and Civics                        ?      1     14     15
 39. Geography                                 1      2     11     11
 40. Sociology                                 0      0      1      1
 41. Application of Social Studies             0      1      5      5

     Totals                                  317    383    476    510



Table 2

Number of Elementary Standardized
Tests Evaluated in 1976

(? = entry not legible)

                                                     Grade
     Educational Goal Area                     1      2    3-4    5-6

  1. Personal Temperament                    174    171    231    220
  2. Socialization                           138    137    196    181
  3. Attitudes, Values and Motivation        131    126    206    156
  4. Valuing Art                              15     16     20     19
  5. Producing Art                             9      8      8      ?
  6. Understanding Art                         0      1      0      1
  7. Career Values and Understanding          25     26     68     74
  8. Understanding and Reasoning             253    210    304    295
  9. Creativity and Judgment                  28     29     34     36
 10. Memory                                   85     75     90     75
 11. Foreign Language Skills                  11     26     28     92
 12. Valuing a Foreign Language
       and Culture                             2      3      5      5
 13. Writing Skills                           55     91    176    170
 14. Reference and Study Skills                3      7     23     26
 15. Understanding Math Concepts              46     34     55     38
 16. Performing Arithmetic Operations         33     70    151    180
 17. Applying and Valuing Mathematics         15     24     56     55
 18. Geometry and Measurement Skills           6      6     24     29
 19. Valuing Music                             0      1      9     10
 20. Performing in Music and Dance             0      0      0      0
 21. Understanding Music                       0      0     52     63
 22. Sensory Perception                      184    124    138    119
 23. Psychomotor Skills                      192    162    171    141
 24. Sports Skills                             ?     13     14     22
 25. Valuing Physical Education                ?      0      1      2
 26. Health Habits and Understanding          10     10     18     17
 27. Understanding Hazards and Diseases        ?      0      0      0
 28. Reading Readiness Skills                315    246    227    146
 29. Familiarity with Literature               0      0      1      1
 30. Reading with Understanding              115    182    296    226
 31. Reading Interpretation and Criticism     47     51     63     55
 32. Valuing Literature and Language           1      2      5      ?
 33. Understanding Religion                    0      0      0      2
 34. Personal Ethics and Religious Belief     11     10     12      ?
 35. Investigating the Environment             4      4      2      3
 36. Understanding Science                     3      3     11     14
 37. Valuing and Applying Science              1      2      6      5
 38. Understanding History and Civics          0      1      4      9
 39. Understanding Geography                   5      6     18     20
 40. Understanding Social Relationships        0      0      2      2
 41. Valuing and Applying Social Studies      13     14     21     20

     Totals                                 1938   1891   2746   2552

     Total, all grades                                            9127

Changes in the Quality of Tests

To approach the question of test quality, it was necessary to examine
ratings of instruments on the various test evaluation criteria. There were
over 20 criteria employed in evaluating tests (24 in 1970, 36 in 1976) and
there were thousands of tests evaluated, so some simplification was required
to avoid "data overload." It was decided to concentrate on tests in
several key areas in the affective and cognitive educational domains. Each
time, the same procedure was used. Test ratings were compared for concurrent
and predictive validity (combined) and for three types of test reliability:
test-retest, internal consistency and alternate form reliability.

In order to make test evaluations comparable for the 1970 and 1976
data, some adjustments were made in the ratings. In 1970, concurrent and
predictive validity ratings were made using a 0 to 5 scale (ranging from
"no evidence reported" to "exhaustive evidence"). In 1976, there were two
separate criteria, each having a 0 to 2 scale (the higher a test was rated,
the better its quality on each criterion). To facilitate comparison, the
separate ratings in 1976 were added for each test to yield a new combined
validity scale ranging in value from 0 to 4. The validity criterion designed
for the present study contained 4 categories: high, medium, low, and very
low or unreported. These reflected the following respective quality point
designations:

     high:                     1970, 4 or 5 points; 1976, 4 points
     medium:                   1970 and 1976, 3 points
     low:                      1970 and 1976, 2 points
     very low or unreported:   1970 and 1976, 1 or 0 points

With reliability ratings, very minor changes were made to make the 1970 and
1976 data comparable. No new scales were constructed.
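
The rescaling just described amounts to a small lookup. The Python sketch below restates it under the assumption that the 1976 combined score is the plain sum of one concurrent and one predictive rating; the function names are hypothetical and the code is an illustration, not the procedure used by the original raters.

```python
def combined_1976_validity(concurrent: int, predictive: int) -> int:
    """Add the two 1976 ratings (each 0 to 2) into one 0 to 4 score."""
    for rating in (concurrent, predictive):
        if not 0 <= rating <= 2:
            raise ValueError("each 1976 rating must lie between 0 and 2")
    return concurrent + predictive


def validity_category(year: int, points: int) -> str:
    """Map quality points to the four categories used in this study."""
    top = {1970: 5, 1976: 4}[year]  # highest score possible in each year
    if not 0 <= points <= top:
        raise ValueError("points out of range for that year")
    if points >= 4:
        return "high"
    if points == 3:
        return "medium"
    if points == 2:
        return "low"
    return "very low or unreported"


print(validity_category(1970, 5))
print(validity_category(1976, combined_1976_validity(2, 1)))
```

A 1970 test needed 4 or 5 of 5 points to count as high, while a 1976 test needed the maximum on both criteria.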

In making the ratings for validity and reliability, test evaluators
searched through publisher-supplied information to arrive at a judgment.
Those tests that cited validity and reliability studies with high correlation
coefficients were given the highest ratings. Medium correlations (ranging
from .70 to .90) yielded medium ratings. If no studies were reported or if
correlations were less than .70, the test was rated low.
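
The cut points for reliability coefficients can be written as a short function. The Python sketch below is illustrative only: `coefficient_band` is a hypothetical name, and placing a coefficient of exactly .90 in the middle band is an assumption, since the text gives the medium range as .70 to .90 without saying which band the endpoint belongs to.

```python
from typing import Optional


def coefficient_band(r: Optional[float]) -> str:
    """Band a reported coefficient using the .70 and .90 cut points.

    `None` stands for a coefficient the publisher did not report.
    Assumption: a value of exactly .90 falls in the middle band.
    """
    if r is None or r < 0.70:
        return "r < .70 or unreported"
    if r > 0.90:
        return "r > .90"
    return ".70 to .90"


for value in (0.93, 0.81, 0.55, None):
    print(value, "->", coefficient_band(value))
```

Treating an unreported coefficient the same as a low one mirrors the rule that a test with no cited studies was rated low.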

In the first area studied, test evaluations were compared at the
upper grade levels in an important part of the affective domain: the area of
attitudes, values and motivation (1970, goal 3, Attitudes plus goal 4, Needs
and Interests; 1976, goal 3, Attitudes, Values, and Motivation). This
educational area covered such topics as attitude toward school, self-esteem,
and achievement motivation. Table 3 reveals that the majority of such tests
were rated low in validity and reliability in both 1970 and 1976. A few
instruments received high ratings in reliability.

The area of reasoning was a part of the cognitive domain that had many
tests in the category (1970, goal 8, Reasoning; 1976, goal 8, Understanding
and Reasoning). The area covered instruments that measured skills
traditionally included in intelligence tests: mental abilities such as
classification, comprehension of information, logical reasoning and spatial
reasoning. It was determined that there were increases in test quality
among instruments in this category, at least in terms of slight increases in
the absolute number of high quality instruments available. Table 4 shows
that in 1976, there were more grade 5 and 6 tests with solid validity and
reliability data than were available in 1970. The greatest increase in the
number of high quality instruments occurred on the criterion of internal
consistency reliability. The number of instruments with reported
coefficients greater than .90 more than doubled.
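
The difference between a rise in absolute numbers and a rise in the share of high quality tests can be checked against the Table 4 figures for internal consistency. The Python sketch below is an added illustration; the counts are those reported in Table 4, and the helper name `percent` is hypothetical.

```python
def percent(n: int, total: int) -> int:
    """Percentage of `total` represented by `n`, rounded to a whole number."""
    return round(100 * n / total)


# Reasoning tests, grades 5 and 6, with internal consistency above .90,
# and the number of reasoning tests rated in each year (Table 4).
high_1970, total_1970 = 20, 97
high_1976, total_1976 = 55, 295

count_ratio = high_1976 / high_1970
share_1970 = percent(high_1970, total_1970)
share_1976 = percent(high_1976, total_1976)

print(count_ratio, share_1970, share_1976)  # 2.75 21 19
```

The count of such tests rose by a factor of 2.75, while their share of all reasoning tests at these grades moved from 21 to 19 percent, which is why the gain is described in terms of absolute numbers.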

The foregoing summaries concerned areas of great interest to educators,
but not usually amenable to direct educational intervention. However, most
of the educational goal categories aimed at traditional curricular areas
(i.e., school subjects). For example, several goal areas covered mathematics
skills, the latter being a significant part of every elementary school
curriculum. Table 5 gives ratings (at grades 5 and 6) of tests of arithmetic
operations (1970, goal 16, Arithmetic Operations; 1976, goal 16, Performing
Arithmetic Operations). These categories contained tests of ability to
perform arithmetic: computation with whole numbers, fractions, decimals
and percentages. Few tests at either year were high in validity or in
test-retest and alternate form reliability. A slight increase did occur for
instruments with high internal consistency coefficients.

The final educational category that was compared for changes in test
quality was the area of reading readiness skills. Here, tests at grade 1
were examined. This age level was chosen since accurate information about
reading readiness is most useful at the earliest primary school level. The
subskills involved here included listening and speaking ability and word
attack skills such as phonetic recognition (1970, goal 27, Oral-Aural Skills
plus goal 28, Word Recognition; 1976, goal 28, Reading Readiness Skills).
Table 6 reveals that not many changes occurred over the six-year interval.
With this educational area, predictive validity is crucial to test usability,
but ratings were low on the combined concurrent and predictive validity
criterion. As with the other goal categories studied, there was a slight
positive change in internal consistency reliability.



Table 3

Numbers and Percentages of Tests Rated for
Validity and Reliability
Tests of Attitudes, Values, and Motivation, Grades 5, 6

                                                    Year
                                              1970          1976
Evaluation Criterion                         n     %       n     %

Concurrent and predictive validity
     High                                    0     0       0     0
     Medium                                  1     3       0     0
     Low                                     1     3       0     0
     Very low or unreported                 38    94     156   100

Test reliability
  Test-retest coefficient
     r > .90                                 1     2       0     0
     .70 < r < .90                          12    30       4     3
     r < .70 or unreported                  27    68     152    97
  Internal consistency coefficient
     r > .90                                 0     0       1     1
     .70 < r < .90                          14    35       4     3
     r < .70 or unreported                  26    65     152    97
  Alternate form coefficient
     r > .90                                 0     0       0     0
     .70 < r < .90                          12    30       0     0
     r < .70 or unreported                  28    70     156   100



Table 4

Numbers and Percentages of Tests Rated for
Validity and Reliability
Tests of Reasoning, Grades 5, 6

                                                    Year
                                              1970          1976
Evaluation Criterion                         n     %       n     %

Concurrent and predictive validity
     High                                    9     9      16     5
     Medium                                 18    19       2     1
     Low                                     6     6      52    18
     Very low or unreported                 64    66     225    76

Test reliability
  Test-retest coefficient
     r > .90                                 6     6       7     2
     .70 < r < .90                          23    24      28     9
     r < .70 or unreported                  68    70     260    89
  Internal consistency coefficient
     r > .90                                20    21      55    19
     .70 < r < .90                          34    35      47    16
     r < .70 or unreported                  43    44     193    65
  Alternate form coefficient
     r > .90                                 5     5      13     4
     .70 < r < .90                          12    12      19     6
     r < .70 or unreported                  80    82     263    90



Table 5

Numbers and Percentages of Tests Rated for
Validity and Reliability
Tests of Arithmetic Operations, Grades 5, 6

                                                    Year
                                              1970          1976
Evaluation Criterion                         n     %       n     %

Concurrent and predictive validity
     High                                    0     0       3     2
     Medium                                 16    22       0     0
     Low                                     2     3      19    11
     Very low or unreported                 54    75     158    87

Test reliability
  Test-retest coefficient
     r > .90                                 2     3       1    <1
     .70 < r < .90                          11    15       6     3
     r < .70 or unreported                  59    82     173    97
  Internal consistency coefficient
     r > .90                                22    31      37    21
     .70 < r < .90                          23    31      20    11
     r < .70 or unreported                  27    38     123    68
  Alternate form coefficient
     r > .90                                 0     0       0     0
     .70 < r < .90                          13    18      14     8
     r < .70 or unreported                  59    82     166    92



Table 6

Numbers and Percentages of Tests Rated for
Validity and Reliability
Tests of Reading Readiness, Grade 1

                                                    Year
                                              1970          1976
Evaluation Criterion                         n     %       n     %

Concurrent and predictive validity
     High                                    2     4       3     1
     Medium                                  3     5       2     1
     Low                                     8    14      10     3
     Very low or unreported                 43    77     300    95

Test reliability
  Test-retest coefficient
     r > .90                                 1     2       1    <1
     .70 < r < .90                           0     0       3     1
     r < .70 or unreported                  55    98     311    99
  Internal consistency coefficient
     r > .90                                13    23      24     8
     .70 < r < .90                           8    14      21     7
     r < .70 or unreported                  35    63     270    85
  Alternate form coefficient
     r > .90                                 0     0       0     0
     .70 < r < .90                           3     5       2     1
     r < .70 or unreported                  53    95     313    99



Discussion

The results showed that the quantity of elementary level standardized
tests increased greatly during the 1970's. Almost every major educational
area had marked expansion in the availability of published instruments.
Increases have been proportional. The areas of education well covered by
tests in 1970 remained well covered in 1976. Unfortunately, the areas poorly
covered at the beginning of the decade remained poorly covered at mid-decade.
Areas such as arts and crafts, foreign language education, music, science
and social studies are curricula without the measurement options they
deserve. The major reason for this would probably derive from the
non-traditional or heterogeneous character of these subjects. Some school
districts do not emphasize these subjects. When the subjects are taught,
different content areas are emphasized in different districts. Without a
uniform approach to subject matter, test publishers are hard-pressed to
develop tests that can be relevant to a variety of educational approaches.



The results regarding test quality ratings were depressing, if not
altogether surprising. It would appear that, despite the enormous growth
in the number of tests, a parallel growth in quality has not occurred. Tests
of attitudes, reasoning, mathematics, and reading readiness have not shown
noteworthy growth in quality. Of the test categories examined in this study,
the most significant positive changes occurred among tests of reasoning, an
area that encompassed IQ tests. It is possible that the tremendous public
interest in intelligence tests in the early 1970's (especially regarding
the testing of minorities) may have spurred test developers to refine
intelligence measures to the greatest possible extent. Another reason for
the rise in quality in this area may relate to its long history; it was one
of the first areas addressed by test makers. There is a vast literature of
published research and test construction techniques from which authors of
new tests can benefit.

There were several limitations to this study. First, it was necessary
to construct a new scale to make predictive and concurrent validity ratings
comparable for the two sets of ratings. This may have acted to "penalize"
one set of ratings (such an occurrence is unlikely, but possible). A second
limitation concerns differences in test evaluation procedures used in 1970
and 1976. Rating criteria were better defined and more stringently applied
in 1976. This had the effect of requiring very convincing empirical evidence
in order for a test to be rated high in validity or reliability. As a
result, ratings for 1976 may have been somewhat higher had some of the
1970 procedures been used in 1976. (Of course, the opposite is also true:
had the 1976 procedures been applied to the 1970 data, the latter ratings
would have been lower than they were.)

Despite these limitations, there is no reason to believe that the
general findings were substantially affected. Regardless of minor differences
in procedure, the conclusions remain: there has been an increase in the
quantity of elementary level standardized tests and a negligible increase
in quality.

The reader should note that some, if only a few, high-quality tests
exist. The hundreds of mediocre instruments on the market should not
obscure the good tests that are available. This study threw together highly
developed instruments with some very poor specimens and the reader should
not lose the proper perspective.



Perhaps the greatest value of the study was to reinforce the importance
of each test consumer carefully considering the quality of a test before it
is purchased. As a rule, poor tests outnumber good tests. This is true
regardless of the grade level or educational area of the test. For example,
a study of secondary level tests (Petrosko, in press) has revealed findings
basically congruent with the present study. An astute user of tests should
consider relevant research and technical supporting information before
making an expensive commitment to purchase a particular measurement device.






References

Hoepfner, R., Bastone, M., Ogilvie, V. N., Hunter, R., Sparta, S., Grothe,
     C. R., Shani, E., Hufano, L., Goldstein, E., Williams, R. S., & Smith,
     K. O. CSE elementary school test evaluations. Los Angeles: Center for
     the Study of Evaluation, University of California, 1976.

Hoepfner, R., Strickland, G., Stangel, G., Jansen, P., & Patalino, M. CSE
     elementary school test evaluations. Los Angeles: Center for the Study
     of Evaluation, University of California, 1970.

Petrosko, J. M. The quality of standardized high school mathematics tests.
     Journal for Research in Mathematics Education, in press.



Acknowledgments 



This study was based on data obtained from a project supported by the
National Institute of Education (contract NE-C-00-3-0096). Conclusions do
not necessarily reflect the views of that agency. The author wishes to
thank Ralph Hoepfner of the System Development Corporation and Adrianne
Bank and Russell Hunter of the Center for the Study of Evaluation.



